89 research outputs found

KnowBots: Discovering Relevant Patterns in Chatbot Dialogues

Author: A Kerly
BA Shawar
C Chakrabarti
C Mooney
F Herrera
H Shah
J Jia
J Pereira
JY Chai
KB Shah
M Souza
P Fournier-Viger
W Duivesteijn
Publication venue: Springer
Publication date: 16/10/2019

Crossref

University of Twente Research Information

Prefix-Projection Global Constraint for Sequential Pattern Mining

Author: B Negrevergne
G Pesant
G Yang
MJ Zaki
MN Garofalakis
N Beldiceanu
P Fournier-Viger
T Guns
Publication date: 23/06/2015

Sequential pattern mining under constraints is a challenging data mining task. Many efficient ad hoc methods have been developed for mining sequential patterns, but they all suffer from a lack of genericity. Recent works have investigated Constraint Programming (CP) methods, but they are still not effective because of their encoding. In this paper, we propose a global constraint based on the projected databases principle which remedies this drawback. Experiments show that our approach clearly outperforms CP approaches and competes well with ad hoc methods on large datasets.

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

Mining Partially-Ordered Sequential Rules Common to Multiple Sequences

Author: Cao L
Fournier-Viger P
Nkambou R
Tseng VS
Wu CW
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2015

© 2015 IEEE. Sequential rule mining is an important data mining problem with multiple applications. An important limitation of algorithms for mining sequential rules common to multiple sequences is that rules are very specific and therefore many similar rules may represent the same situation. This can cause three major problems: (1) similar rules can be rated quite differently, (2) rules may not be found because they are individually considered uninteresting, and (3) rules that are too specific are less likely to be used for making predictions. To address these issues, we explore the idea of mining "partially-ordered sequential rules" (POSR), a more general form of sequential rules such that items in the antecedent and the consequent of each rule are unordered. To mine POSR, we propose the RuleGrowth algorithm, which is efficient and easily extendable. In particular, we present an extension (TRuleGrowth) that accepts a sliding-window constraint to find rules occurring within a maximum amount of time. A performance study with four real-life datasets shows that RuleGrowth and TRuleGrowth have excellent performance and scalability compared to baseline algorithms and that the number of rules discovered can be several orders of magnitude smaller when the sliding-window constraint is applied. Furthermore, we also report results from a real application showing that POSR can provide a much higher prediction accuracy than regular sequential rules for sequence prediction.
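
To make the notion of a partially-ordered sequential rule concrete, here is a minimal Python sketch of how such a rule could be checked against a sequence database. It is not the RuleGrowth algorithm and does not handle the sliding-window constraint; it only evaluates one given rule. The function names, the toy database, and the exact support and confidence definitions (fraction of sequences where the rule occurs, and that count divided by the number of sequences containing the antecedent) are assumptions made for illustration, and the antecedent and consequent are assumed to be disjoint.

```python
from typing import List, Set, Tuple

Sequence = List[Set[str]]  # a sequence is an ordered list of itemsets


def rule_occurs(seq: Sequence, antecedent: Set[str], consequent: Set[str]) -> bool:
    """True if every antecedent item appears (in any order) before every consequent item."""
    seen: Set[str] = set()
    for i, itemset in enumerate(seq):
        seen |= itemset
        if antecedent <= seen:
            # Earliest position where the antecedent is complete; the consequent
            # must be covered by the itemsets that come strictly after it.
            rest: Set[str] = set().union(*seq[i + 1:])
            return consequent <= rest
    return False


def support_confidence(
    db: List[Sequence], antecedent: Set[str], consequent: Set[str]
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent (assumed definitions)."""
    both = sum(rule_occurs(s, antecedent, consequent) for s in db)
    ante = sum(antecedent <= set().union(*s) for s in db)
    support = both / len(db)
    confidence = both / ante if ante else 0.0
    return support, confidence


# Hypothetical toy database of four sequences.
db: List[Sequence] = [
    [{"a", "b"}, {"c"}, {"f"}, {"g"}, {"e"}],
    [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}],
    [{"a"}, {"b"}, {"f"}, {"e"}],
    [{"b"}, {"f", "g"}],
]

print(support_confidence(db, {"a", "b"}, {"e", "f"}))  # (0.75, 1.0)
print(support_confidence(db, {"f"}, {"e"}))            # (0.5, 0.5)
```

Because the items inside the antecedent and inside the consequent are unordered, the rule {a, b} -> {e, f} matches the first three sequences even though a, b, e and f appear in different orders in each of them.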

OPUS - University of Technology Sydney

IMSR_PreTree: an improved algorithm for mining sequential rules based on the prefix-tree

Author: D Lo
E Baralis
J Pei
K Gouda
MJ Zaki
P Fournier-Viger
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A Knowledge Discovery Framework for Learning Task Models from User Interactions in Intelligent Tutoring Systems

Author: J. Pei
J. Wang
M.J. Zaki
P. Fournier-Viger
R. Agrawal
S.B. Blessing
Y. Hirate
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Domain experts should provide relevant domain knowledge to an Intelligent Tutoring System (ITS) so that it can guide a learner during problem-solving learning activities. However, for many ill-defined domains, the domain knowledge is hard to define explicitly. In previous works, we showed how sequential pattern mining can be used to extract a partial problem space from logged user interactions, and how it can support tutoring services during problem-solving exercises. This article describes an extension of this approach to extract a problem space that is richer and more adapted for supporting tutoring services. We combined sequential pattern mining with (1) dimensional pattern mining, (2) time intervals, (3) the automatic clustering of valued actions and (4) closed sequences mining. Some tutoring services have been implemented and an experiment has been conducted in a tutoring system. Comment: Proceedings of the 7th Mexican International Conference on Artificial Intelligence (MICAI 2008), Springer, pp. 765-77

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL Clermont Université

Techniques for Complex Analysis of Contemporary Data

Author: Agrawal R.
Batko M.
Bodon F.
Ciaccia P.
Fournier-Viger P.
Han J.
MathWorks I.
Schubert E.
Zezula P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Contemporary data objects are typically complex, semi-structured, or not structured at all. Besides, objects are also related to form a network. In such a situation, data analysis requires not only the traditional attribute-based access but also access based on similarity as well as data mining operations. Though tools for such operations do exist, they usually specialise in operation and are available for specialized data structures supported by specific computer system environments. In contrast, advanced analyses are obtained by applying several elementary access operations, which in turn requires expert knowledge in multiple areas. In this paper, we propose a unification platform for various data analytical operators specified as a general-purpose analytical system ADAMiSS. An extensible data-mining and similarity-based set of operators over a common versatile data structure allows the recursive application of heterogeneous operations, thus allowing the definition of complex analytical processes, necessary to solve the contemporary analytical tasks. As a proof-of-concept, we present results that were obtained by our prototype implementation on two real-world data collections: the Twitter Higgs boson and the Kosarak datasets.

Crossref

Univerzitní repozitář Masarykovy univerzity

Mining attribute evolution rules in dynamic attributed graphs

Author: B Bringmann
E Desmier
M Berlingerio
P Fournier-Viger
P Lenca
Z Cheng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

A dynamic attributed graph is a graph that changes over time and where each vertex is described using multiple continuous attributes. Such graphs are found in numerous domains, e.g., social network analysis. Several studies have been done on discovering patterns in dynamic attributed graphs to reveal how attribute(s) change over time. However, many algorithms restrict all attribute values in a pattern to follow the same trend (e.g., increase) and the set of vertices in a pattern to be fixed, while others consider that a single vertex may influence its neighbors. As a result, these algorithms are unable to find complex patterns that show the influence of multiple vertices on many other vertices in terms of several attributes and different trends. This paper addresses this issue by proposing to discover a novel type of pattern called attribute evolution rules (AER). These rules indicate how changes of attribute values of multiple vertices may influence those of others with a high confidence. An efficient algorithm named AER-Miner is proposed to find these rules. Experiments on real data show that AER-Miner is efficient and that AERs can provide interesting insights about dynamic attributed graphs.

Crossref

Research Commons@Waikato

Discovering High-Utility Itemsets at Multiple Abstraction Levels

Author: E Baralis
JC Lin
L Cagliero
P Fournier-Viger
S Krishnamoorthy
VS Tseng
Y Liu
Publication venue: 'Springer Science and Business Media LLC'

High-Utility Itemset Mining (HUIM) is a relevant data mining task. The goal is to discover recurrent combinations of items characterized by high profit from transactional datasets. HUIM has a wide range of applications, among which are market basket analysis and service profiling. Based on the observation that items can be clustered into domain-specific categories, a parallel research issue is generalized itemset mining. It entails generating correlations among data items at multiple abstraction levels. The extraction of multiple-level patterns affords new insights into the analyzed data from different viewpoints. This paper aims at discovering a novel pattern that combines the expressiveness of generalized and High-Utility itemsets. According to a user-defined taxonomy, items are first aggregated into semantically related categories. Then, a new type of pattern, namely the Generalized High-utility Itemset (GHUI), is extracted. It represents a combination of items at different granularity levels characterized by high profit (utility). While profitable combinations of item categories provide interesting high-level information, GHUIs at lower abstraction levels represent more specific correlations among profitable items. A single-phase algorithm is proposed to efficiently discover utility itemsets at multiple abstraction levels. The experiments, which were performed on both real and synthetic data, demonstrate the effectiveness and usefulness of the proposed approach.
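
The idea of measuring utility at several abstraction levels can be illustrated with a small Python sketch. It is not the single-phase algorithm from the paper; it only computes the utility of one given itemset. The taxonomy, the unit profits, the transactions, and the rule that a category's utility in a transaction is the sum of the utilities of its member items are all assumptions chosen for illustration, and unit profits are assumed to be positive.

```python
from typing import Dict, Iterable, List

Transaction = Dict[str, int]  # item -> purchased quantity


def item_utility(
    tx: Transaction, item: str, profit: Dict[str, int], taxonomy: Dict[str, List[str]]
) -> int:
    """Utility of a leaf item or of a category in one transaction (0 if absent)."""
    if item in taxonomy:
        # Assumed rule: a category aggregates the utilities of its children.
        return sum(item_utility(tx, child, profit, taxonomy) for child in taxonomy[item])
    return tx.get(item, 0) * profit[item]


def itemset_utility(
    db: List[Transaction],
    itemset: Iterable[str],
    profit: Dict[str, int],
    taxonomy: Dict[str, List[str]],
) -> int:
    """Sum the itemset's utility over the transactions that contain all of its members."""
    members = list(itemset)
    total = 0
    for tx in db:
        utils = [item_utility(tx, it, profit, taxonomy) for it in members]
        if all(u > 0 for u in utils):  # every member must be present
            total += sum(utils)
    return total


# Hypothetical toy data.
profit = {"cola": 1, "juice": 2, "chips": 3, "nuts": 5}
taxonomy = {"drinks": ["cola", "juice"], "snacks": ["chips", "nuts"]}
db: List[Transaction] = [
    {"cola": 2, "chips": 1},
    {"juice": 1, "nuts": 2},
    {"cola": 1},
]

print(itemset_utility(db, ["cola", "chips"], profit, taxonomy))     # 5
print(itemset_utility(db, ["drinks", "snacks"], profit, taxonomy))  # 17
```

With a minimum utility threshold of, say, 10, the item-level combination {cola, chips} would be discarded while the category-level combination {drinks, snacks} would be kept, which is the kind of higher-level information that a generalized pattern can expose.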

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Learning Behavioral Representations of Human Mobility

Author: Bringman K.
Chandra D. K.
Damiani M.L.
Fournier-Viger P.
Fuglede B.
Güting R.H.
Le Q.
Li X.
McInnes L.
Quadri C.
Wang P.
Yan B.
Řehůřek R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/09/2020

In this paper, we investigate the suitability of state-of-the-art representation learning methods to the analysis of behavioral similarity of moving individuals, based on CDR trajectories. The core of the contribution is a novel methodological framework, mob2vec, centered on the combined use of a recent symbolic trajectory segmentation method for the removal of noise, a novel trajectory generalization method incorporating behavioral information, and an unsupervised technique for the learning of vector representations from sequential data. Mob2vec is the result of an empirical study conducted on real CDR data through extensive experimentation. As a result, it is shown that mob2vec generates vector representations of CDR trajectories in low dimensional spaces which preserve the similarity of the mobility behavior of individuals. Comment: ACM SIGSPATIAL 2020: 28th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. November 2020, Seattle, Washington, US

arXiv.org e-Print Archive

Crossref

BicSPAM: flexible biclustering using sequential patterns

Author: A Ben-Dor
A Califano
A Patrikainen
A Prelić
A Serin
A Tanay
AA Alizadeh
AR Donders
C Creighton
C Ding
C Tang
D Bozdağ
D Martin
DS Hochbaum
F Zhu
G Atluri
G Bebek
G Getz
G Pandey
GF Berriz
H Choi
H Toivonen
H Wang
J Bellay
J Han
J Ihmels
J Liu
J Pei
J Wang
J Yang
JA Hartigan
K Sim
K Yip
L Lazzeroni
M Charrad
M de Souto
M Steinbach
MA Mahfouz
MJ Zaki
NR Mabroukeh
O Troyanskaya
P Carmona-Saez
P Fournier-Viger
Q Fang
Q Sheng
R Henriques
R Martinez
Rui Henriques
S Barkow
S Hochreiter
S Madeira
S Tavazoie
Sara C Madeira
SC Madeira
SS Young
T Calders
T Hellem
TR Golub
U Alon
X Yan
Y Huang
Y Okada
Publication venue: 'Springer Science and Business Media LLC'

Crossref